Auditing and Maintaining Provenance in Software Packages

نویسندگان

  • Quan Pham
  • Tanu Malik
  • Ian T. Foster
چکیده

Science projects are increasingly investing in computational reproducibility. Constructing software pipelines to demonstrate reproducibility is also becoming increasingly common. To aid the process of constructing pipelines, science project members often adopt reproducible methods and tools. CDE is a software packaging tool for reproducible research that encapsulates source code, datasets and environments. However, CDE is not amenable for creating software pipelines that demonstrate computational reproducibility. In this work, we propose software provenance to be included as part of CDE so that resulting provenanceincluded CDE packages can be easily used for creating software pipelines. We describe provenance attributes that must be included and how they can be efficiently stored in a light-weight CDE package. Further we show how a provenance in a package can be used for creating software pipelines and maintained as new packages are created. We experimentally evaluate the overhead of auditing and maintaining provenance and compare with heavy weight approaches for reproducibility such as virtualization. Our experiments indicate minimal overheads.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Transparent Web Service Auditing via Network Provenance Functions

Detecting and explaining the nature of attacks in distributed web services is often difficult – determining the nature of suspicious activity requires following the trail of an attacker through a chain of heterogeneous software components including load balancers, proxies, worker nodes, and storage services. Unfortunately, existing forensic solutions cannot provide the necessary context to link...

متن کامل

SPADE: Support for Provenance Auditing in Distributed Environments

SPADE is an open source software infrastructure for data provenance collection and management. The underlying data model used throughout the system is graph-based, consisting of vertices and directed edges that are modeled after the node and relationship types described in the Open Provenance Model. The system has been designed to decouple the collection, storage, and querying of provenance met...

متن کامل

Formal Foundations of Reenactment and Transaction Provenance

Provenance is essential for auditing, data debugging, understanding transformations, and many additional use cases. All these use cases would benefit from provenance for transactional updates. We present a provenance model for snapshot isolation transactions extending the semiring framework with version annotations and updates. Based on this model, we present the first solution for computing th...

متن کامل

Provenance-Based Auditing of Private Data Use

Across the world, organizations are required to comply with regulatory frameworks dictating how to manage personal information. Despite these, several cases of data leaks and exposition of private data to unauthorized recipients have been publicly and widely advertised. For authorities and system administrators to check compliance to regulations, auditing of private data processing becomes cruc...

متن کامل

Intermediate Notation for Provenance and Workflow Reproducibility

We present a technique to capture retrospective provenance across a number of tools in a statistical software suite. Our goal is to facilitate portability of processes between the tools to enhance usability and to support reproducibility. We describe an intermediate notation to aid runtime capture of provenance and demonstrate conversion to an executable and editable workflow. The notation is a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014